temporal condition
- Europe > United Kingdom > England > Staffordshire (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Zhang, Yaoyun, Xu, Xuenan, Wu, Mengyue
The video-to-audio (V2A) generation task has drawn attention in the field of multimedia due to its practicality in producing Foley sound. Semantic and temporal conditions are fed to the generation model to indicate sound events and their temporal occurrence. Recent studies on synthesizing immersive and synchronized audio face challenges on videos with moving visual presence. The temporal condition is not accurate enough, while the low-resolution semantic condition exacerbates the problem. To tackle these challenges, we propose Smooth-Foley, a V2A generative model taking semantic guidance from the textual label across the generation to enhance both semantic and temporal alignment in audio. Two adapters are trained to leverage pre-trained text-to-audio generation models. A frame adapter integrates high-resolution frame-wise video features while a temporal adapter integrates temporal conditions obtained from similarities of visual frames and textual labels. The incorporation of semantic guidance from textual labels achieves precise audio-video alignment. We conduct extensive quantitative and qualitative experiments. Results show that Smooth-Foley performs better than existing models on both continuous sound scenarios and general scenarios. With semantic guidance, the audio generated by Smooth-Foley exhibits higher quality and better adherence to physical laws.
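As a rough illustration of a temporal condition "obtained from similarities of visual frames and textual labels", the sketch below scores each frame embedding against a label embedding with cosine similarity, giving one value per frame. It is not the Smooth-Foley implementation: the function names, the toy 2-D embeddings, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def temporal_condition(frame_embeddings, label_embedding):
    """Hypothetical helper: one similarity score per video frame."""
    return [cosine(frame, label_embedding) for frame in frame_embeddings]

# Toy embeddings (assumed): the labeled object fades out of view over three frames.
frames = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
label = [1.0, 0.0]
curve = temporal_condition(frames, label)
```

In practice the embeddings would presumably come from a pre-trained vision-language encoder; the resulting per-frame curve is the kind of signal a temporal adapter could consume.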
S2DM: Sector-Shaped Diffusion Models for Video Generation
Lang, Haoran, Ge, Yuxuan, Tian, Zheng
Diffusion models have achieved great success in image generation. However, when leveraging this idea for video generation, we face significant challenges in maintaining the consistency and continuity across video frames. This is mainly caused by the lack of an effective framework to align frames of videos with desired temporal features while preserving consistent semantic and stochastic features. In this work, we propose a novel Sector-Shaped Diffusion Model (S2DM) whose sector-shaped diffusion region is formed by a set of ray-shaped reverse diffusion processes starting at the same noise point. S2DM can generate a group of intrinsically related data sharing the same semantic and stochastic features while varying on temporal features with appropriate guided conditions. We apply S2DM to video generation tasks, and explore the use of optical flow as temporal conditions. Our experimental results show that S2DM outperforms many existing methods in the task of video generation without any temporal-feature modelling modules. For text-to-video generation tasks where temporal conditions are not explicitly given, we propose a two-stage generation strategy which can decouple the generation of temporal features from semantic-content features. We show that, without additional training, our model integrated with another generative model for temporal conditions can still achieve performance comparable to existing works. Our results can be viewed at https://s2dm.github.io/S2DM/.
Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning
Hu, Jifeng, Sun, Yanchao, Huang, Sili, Guo, SiYuan, Chen, Hechang, Shen, Li, Sun, Lichao, Chang, Yi, Tao, Dacheng
Recent works have shown the potential of diffusion models in computer vision and natural language processing. Apart from the classical supervised learning fields, diffusion models have also shown strong competitiveness in reinforcement learning (RL) by formulating decision-making as sequential generation. However, incorporating temporal information of sequential data and utilizing it to guide diffusion models to perform better generation is still an open challenge. In this paper, we take one step forward to investigate controllable generation with temporal conditions that are refined from temporal information. We observe the importance of temporal conditions in sequential generation in sufficient explorative scenarios and provide a comprehensive discussion and comparison of different temporal conditions. Based on the observations, we propose an effective temporally-conditional diffusion model coined Temporally-Composable Diffuser (TCD), which extracts temporal information from interaction sequences and explicitly guides generation with temporal conditions. Specifically, we separate the sequences into three parts according to time expansion and identify historical, immediate, and prospective conditions accordingly. Each condition preserves non-overlapping temporal information of sequences, enabling more controllable generation when we jointly use them to guide the diffuser. Finally, we conduct extensive experiments and analysis to reveal the favorable applicability of TCD in offline RL tasks, where our method reaches or matches the best performance compared with prior SOTA baselines.
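The three-way split described in the abstract can be pictured with a minimal sketch that partitions a sequence into non-overlapping historical, immediate, and prospective parts around a time index. This is not the TCD implementation: the function name, the `horizon` parameter, and the exact boundaries are assumptions for illustration only.

```python
def split_by_time(sequence, t, horizon=1):
    """Hypothetical helper: split a sequence into three non-overlapping parts.

    historical  -> items before index t
    immediate   -> the next `horizon` items starting at t
    prospective -> everything after the immediate part
    """
    if not 0 <= t <= len(sequence):
        raise ValueError("t must lie within the sequence")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    end = min(t + horizon, len(sequence))
    return sequence[:t], sequence[t:end], sequence[end:]

# Toy trajectory of six steps (assumed data).
trajectory = list(range(6))
historical, immediate, prospective = split_by_time(trajectory, t=2)
```

Because the slices share no elements, concatenating the three parts recovers the original sequence, which mirrors the abstract's point that each condition preserves non-overlapping temporal information.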
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
- North America > United States > Montana (0.04)
Representation learning of rare temporal conditions for travel time prediction
Petersen, Niklas, Rodrigues, Filipe, Pereira, Francisco
Predicting travel time under rare temporal conditions (e.g., public holidays or school vacation periods) is challenging because historical data are limited. Where historical data are available at all, they often form a heterogeneous time series, since other changes are likely to occur over long periods of time (e.g., road works or newly introduced traffic calming initiatives). This is especially prominent in cities and suburban areas. We present a vector-space model for encoding rare temporal conditions that allows coherent representation learning across different temporal conditions. We show increased performance for travel time prediction over different baselines when utilizing the vector-space encoding for representing the temporal setting.
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
- Asia > China > Shandong Province > Jinan (0.04)
Temporal Subtyping of Alzheimer's Disease Using Medical Conditions Preceding Alzheimer's Disease Onset in Electronic Health Records
He, Zhe, Tian, Shubo, Erdengasileng, Arslan, Charness, Neil, Bian, Jiang
Subtyping of Alzheimer's disease (AD) can facilitate diagnosis, treatment, prognosis and disease management. It can also support the testing of new prevention and treatment strategies through clinical trials. In this study, we employed spectral clustering to cluster 29,922 AD patients in the OneFlorida Data Trust using their longitudinal EHR data of diagnosis and conditions into four subtypes. In addition, according to the results of various statistical tests, these subtypes are also significantly different with respect to demographics, mortality, and prescription medications after the AD diagnosis. This study could potentially facilitate early detection and personalized treatment of AD as well as data-driven generalizability assessment of clinical trials for AD.
Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder that affects an estimated 6.2 million Americans age 65 and older in 2021. This number is likely to reach 13.8 million by 2060.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Alaska (0.05)
- North America > United States > Florida > Leon County > Tallahassee (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)